Overview
Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 50000 |
| Missing cells | 9625 |
| Missing cells (%) | 1.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 26.3 MiB |
| Average record size in memory | 551.4 B |
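The percentages follow from the counts. 9625 missing cells out of 50000 × 18 = 900000 cells is about 1.07%. Spreading 26.3 MiB over 50000 records gives about 551 B per record. The sketch below recomputes these figures with pandas, assuming the data is held in a DataFrame. The report does not say how it measures memory, so using `memory_usage(deep=True)` is an assumption, and its total may not match 26.3 MiB.

```python
import pandas as pd


def dataset_statistics(df: pd.DataFrame) -> dict:
    """Recompute the overview counts for a pandas DataFrame."""
    n_rows, n_cols = df.shape
    missing = int(df.isna().sum().sum())  # missing cells across all columns
    duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
    memory = int(df.memory_usage(deep=True).sum())  # bytes, index included
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / (n_rows * n_cols),
        "duplicate_rows": duplicates,
        "memory_bytes": memory,
        "avg_record_bytes": memory / n_rows,
    }
```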
Variable types
| Text | 2 |
|---|---|
| Numeric | 11 |
| Boolean | 1 |
| Categorical | 4 |
Alerts
| POSSIBLENterm has constant value "True" | Constant |
|---|---|
| Insidesource has constant value "TMHMM2.0" | Constant |
| TMhelixsource has constant value "TMHMM2.0" | Constant |
| Outsidesource has constant value "TMHMM2.0" | Constant |
| ExpnumberofAAsinTMHs is highly overall correlated with Insideend and 5 other fields | High correlation |
| Insideend is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields | High correlation |
| Insidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields | High correlation |
| Length is highly overall correlated with Insideend and 1 other field | High correlation |
| Outsideend is highly overall correlated with Length and 3 other fields | High correlation |
| Outsidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields | High correlation |
| PredictedTMHsNumber is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields | High correlation |
| TMhelixend is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields | High correlation |
| TMhelixstart is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields | High correlation |
| POSSIBLENterm has 9625 (19.2%) missing values | Missing |
| Protein_ID has unique values | Unique |
| Expnumberfirst60AAs has 2544 (5.1%) zeros | Zeros |
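The alerts flag three constant columns. They also flag a boolean whose only non-missing value is True, so its information is carried entirely by missingness. The sketch below prepares such a frame with pandas, assuming the data is loaded as a DataFrame and that a missing POSSIBLENterm means the flag was not raised (an assumption, not something the report states). The function name is hypothetical. Converting the flag to a true/false column first keeps it from being dropped as constant.

```python
import pandas as pd


def prepare_tmhmm_frame(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Assumption: POSSIBLENterm holds True or a missing value, and a missing
    # value means the N-terminal warning was not raised for that protein.
    if "POSSIBLENterm" in out.columns:
        out["POSSIBLENterm"] = out["POSSIBLENterm"].eq(True)
    # Columns with at most one distinct non-missing value carry no information.
    n_distinct = out.nunique(dropna=True)
    constant = n_distinct[n_distinct <= 1].index
    return out.drop(columns=constant)
```

Applied to this dataset, the sketch would drop Insidesource, TMhelixsource and Outsidesource and keep POSSIBLENterm as a two-valued column.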
Reproduction
| Analysis started | 2025-07-21 08:42:10.988390 |
|---|---|
| Analysis finished | 2025-07-21 08:42:26.196952 |
| Duration | 15.21 seconds |
| Software version | ydata-profiling v0.0.dev0 |
| Configuration | config.json |
Variables
Phage_ID
Text
| Distinct | 46765 |
|---|---|
| Distinct (%) | 93.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.2 MiB |
Length
| Max length | 87 |
|---|---|
| Median length | 86 |
| Mean length | 23.76738 |
| Min length | 6 |
Unique
| Unique | 43801 |
|---|---|
| Unique (%) | 87.6% |
Sample
| 1st row | MGV-GENOME-0377366 |
|---|---|
| 2nd row | MGV-GENOME-0228589 |
| 3rd row | TemPhD_cluster_54944 |
| 4th row | TemPhD_cluster_21940 |
| 5th row | uvig_280215 |
| Value | Count | Frequency (%) |
| uvig_82024 | 5 | < 0.1% |
| uvig_26080 | 4 | < 0.1% |
| mgv-genome-0372934 | 4 | < 0.1% |
| mgv-genome-0379973 | 4 | < 0.1% |
| uvig_183868 | 4 | < 0.1% |
| uvig_134152 | 4 | < 0.1% |
| uvig_186748 | 4 | < 0.1% |
| temphd_cluster_6638 | 4 | < 0.1% |
| mgv-genome-0379883 | 4 | < 0.1% |
| mgv-genome-0379887 | 4 | < 0.1% |
| Other values (46755) | 49959 | 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| _ | 114246 | 9.6% |
| 1 | 61386 | 5.2% |
| 0 | 52867 | 4.4% |
| 3 | 50156 | 4.2% |
| 2 | 49524 | 4.2% |
| E | 41171 | 3.5% |
| 4 | 40816 | 3.4% |
| 5 | 39822 | 3.4% |
| M | 37577 | 3.2% |
| 7 | 37163 | 3.1% |
| Other values (55) | 663641 | 55.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1188369 | 100.0% |
Protein_ID
Text
Unique 
| Distinct | 50000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.4 MiB |
Length
| Max length | 90 |
|---|---|
| Median length | 88 |
| Mean length | 26.58144 |
| Min length | 8 |
Unique
| Unique | 50000 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | MGV-GENOME-0377366_94 |
|---|---|
| 2nd row | MGV-GENOME-0228589_3 |
| 3rd row | TemPhD_cluster_54944_50 |
| 4th row | TemPhD_cluster_21940_29 |
| 5th row | uvig_280215_16 |
| Value | Count | Frequency (%) |
| temphd_cluster_30092_4 | 1 | < 0.1% |
| temphd_cluster_6461_44 | 1 | < 0.1% |
| mgv-genome-0377366_94 | 1 | < 0.1% |
| mgv-genome-0228589_3 | 1 | < 0.1% |
| temphd_cluster_54944_50 | 1 | < 0.1% |
| temphd_cluster_21940_29 | 1 | < 0.1% |
| uvig_280215_16 | 1 | < 0.1% |
| temphd_cluster_2820_6 | 1 | < 0.1% |
| uvig_396803_67 | 1 | < 0.1% |
| mgv-genome-0085121_16 | 1 | < 0.1% |
| Other values (49990) | 49990 | > 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| _ | 163061 | 12.3% |
| 1 | 78268 | 5.9% |
| 2 | 62660 | 4.7% |
| 3 | 61828 | 4.7% |
| 0 | 58559 | 4.4% |
| 4 | 50588 | 3.8% |
| 5 | 48220 | 3.6% |
| 6 | 43743 | 3.3% |
| 7 | 43737 | 3.3% |
| 8 | 41403 | 3.1% |
| Other values (55) | 677005 | 50.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1329072 | 100.0% |
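The Sample rows suggest that each Protein_ID is its Phage_ID followed by an underscore and a numeric suffix, for example MGV-GENOME-0377366 becoming MGV-GENOME-0377366_94. That pattern would explain why 50000 proteins map to only 46765 distinct phages. The sketch below assumes the pattern holds and recovers both parts with pandas. The function name and the output column names `phage_id` and `protein_index` are hypothetical.

```python
import pandas as pd


def split_protein_id(protein_ids: pd.Series) -> pd.DataFrame:
    # Split on the last underscore only, so IDs such as
    # "TemPhD_cluster_54944_50" keep their internal underscores.
    parts = protein_ids.str.rsplit("_", n=1, expand=True).reindex(columns=[0, 1])
    return pd.DataFrame(
        {
            "phage_id": parts[0],
            # Non-numeric or absent suffixes become <NA> rather than raising.
            "protein_index": pd.to_numeric(parts[1], errors="coerce").astype("Int64"),
        }
    )
```

If the data is held in a DataFrame `df`, comparing `split_protein_id(df["Protein_ID"])["phage_id"]` against `df["Phage_ID"]` is one way to test the assumed pattern across all rows.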
Length
Real number (ℝ)
High correlation 
| Distinct | 1656 |
|---|---|
| Distinct (%) | 3.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 220.27654 |
| Minimum | 21 |
|---|---|
| Maximum | 7694 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 21 |
|---|---|
| 5-th percentile | 46 |
| Q1 | 81 |
| median | 129 |
| Q3 | 217 |
| 95-th percentile | 793.05 |
| Maximum | 7694 |
| Range | 7673 |
| Interquartile range (IQR) | 136 |
Descriptive statistics
| Standard deviation | 293.61812 |
|---|---|
| Coefficient of variation (CV) | 1.3329523 |
| Kurtosis | 46.935516 |
| Mean | 220.27654 |
| Median Absolute Deviation (MAD) | 58 |
| Skewness | 4.9451327 |
| Sum | 11013827 |
| Variance | 86211.603 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 66 | 376 | 0.8% |
| 68 | 375 | 0.8% |
| 71 | 366 | 0.7% |
| 60 | 358 | 0.7% |
| 107 | 356 | 0.7% |
| 93 | 332 | 0.7% |
| 67 | 328 | 0.7% |
| 74 | 327 | 0.7% |
| 70 | 325 | 0.7% |
| 116 | 323 | 0.6% |
| Other values (1646) | 46534 | 93.1% |
| Value | Count | Frequency (%) |
| 21 | 1 | < 0.1% |
| 22 | 2 | < 0.1% |
| 23 | 1 | < 0.1% |
| 24 | 2 | < 0.1% |
| 25 | 6 | < 0.1% |
| 26 | 5 | < 0.1% |
| 27 | 6 | < 0.1% |
| 28 | 6 | < 0.1% |
| 29 | 92 | 0.2% |
| 30 | 98 | 0.2% |
| Value | Count | Frequency (%) |
| 7694 | 1 | < 0.1% |
| 7324 | 1 | < 0.1% |
| 6121 | 1 | < 0.1% |
| 5721 | 1 | < 0.1% |
| 5089 | 1 | < 0.1% |
| 5055 | 1 | < 0.1% |
| 4711 | 1 | < 0.1% |
| 4439 | 1 | < 0.1% |
| 4421 | 1 | < 0.1% |
| 4289 | 1 | < 0.1% |
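The derived statistics for Length can be checked by hand:

- Range = 7694 − 21 = 7673.
- IQR = 217 − 81 = 136.
- CV = standard deviation / mean = 293.61812 / 220.27654 ≈ 1.333.

The non-integer 95th percentile (793.05) is consistent with linear interpolation, which is pandas' default. Under that method the percentile sits at position 0.95 × (50000 − 1) = 47499.05 in the sorted values.

The sketch below recomputes the quantile and descriptive statistics with pandas. It makes three assumptions: MAD is the median absolute deviation from the median (as the label says), skewness is pandas' sample estimator, and kurtosis is pandas' sample excess kurtosis. The function and key names are hypothetical.

```python
import pandas as pd


def numeric_summary(s: pd.Series) -> dict:
    s = s.dropna()
    # Linear interpolation between order statistics (pandas' default).
    q = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])
    mean, std = s.mean(), s.std()  # std uses the sample (n - 1) denominator
    return {
        "min": s.min(),
        "p5": q.loc[0.05],
        "q1": q.loc[0.25],
        "median": q.loc[0.5],
        "q3": q.loc[0.75],
        "p95": q.loc[0.95],
        "max": s.max(),
        "range": s.max() - s.min(),
        "iqr": q.loc[0.75] - q.loc[0.25],
        "cv": std / mean,
        # Median absolute deviation from the median.
        "mad": (s - s.median()).abs().median(),
        "skewness": s.skew(),
        "kurtosis": s.kurt(),  # excess kurtosis: 0 for a normal distribution
        "variance": s.var(),
    }
```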
PredictedTMHsNumber
Real number (ℝ)
High correlation 
| Distinct | 25 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.9024 |
| Minimum | 1 |
|---|---|
| Maximum | 26 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 26 |
| Range | 25 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.904906 |
|---|---|
| Coefficient of variation (CV) | 1.0013173 |
| Kurtosis | 28.830601 |
| Mean | 1.9024 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.5453767 |
| Sum | 95120 |
| Variance | 3.6286668 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 29439 | 58.9% |
| 2 | 12185 | 24.4% |
| 3 | 3685 | 7.4% |
| 4 | 1937 | 3.9% |
| 5 | 702 | 1.4% |
| 6 | 611 | 1.2% |
| 10 | 262 | 0.5% |
| 7 | 242 | 0.5% |
| 8 | 221 | 0.4% |
| 12 | 155 | 0.3% |
| Other values (15) | 561 | 1.1% |
| Value | Count | Frequency (%) |
| 1 | 29439 | 58.9% |
| 2 | 12185 | 24.4% |
| 3 | 3685 | 7.4% |
| 4 | 1937 | 3.9% |
| 5 | 702 | 1.4% |
| 6 | 611 | 1.2% |
| 7 | 242 | 0.5% |
| 8 | 221 | 0.4% |
| 9 | 138 | 0.3% |
| 10 | 262 | 0.5% |
| Value | Count | Frequency (%) |
| 26 | 3 | < 0.1% |
| 25 | 1 | < 0.1% |
| 24 | 9 | < 0.1% |
| 22 | 8 | < 0.1% |
| 21 | 4 | < 0.1% |
| 20 | 25 | 0.1% |
| 19 | 6 | < 0.1% |
| 18 | 43 | 0.1% |
| 17 | 5 | < 0.1% |
| 16 | 53 | 0.1% |
ExpnumberofAAsinTMHs
Real number (ℝ)
High correlation 
| Distinct | 41719 |
|---|---|
| Distinct (%) | 83.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 42.077003 |
| Minimum | 8.23423 |
|---|---|
| Maximum | 577.09331 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 8.23423 |
|---|---|
| 5-th percentile | 17.47916 |
| Q1 | 20.869492 |
| median | 23.18689 |
| Q3 | 44.360855 |
| 95-th percentile | 112.93182 |
| Maximum | 577.09331 |
| Range | 568.85908 |
| Interquartile range (IQR) | 23.491363 |
Descriptive statistics
| Standard deviation | 44.015965 |
|---|---|
| Coefficient of variation (CV) | 1.0460813 |
| Kurtosis | 28.205171 |
| Mean | 42.077003 |
| Median Absolute Deviation (MAD) | 5.50395 |
| Skewness | 4.4694202 |
| Sum | 2103850.2 |
| Variance | 1937.4051 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 24.87583 | 69 | 0.1% |
| 18.23661 | 66 | 0.1% |
| 36.04048 | 51 | 0.1% |
| 210.43458 | 48 | 0.1% |
| 47.86547 | 42 | 0.1% |
| 108.33627 | 33 | 0.1% |
| 44.96264 | 30 | 0.1% |
| 20.67098 | 30 | 0.1% |
| 71.27387 | 27 | 0.1% |
| 46.4987 | 26 | 0.1% |
| Other values (41709) | 49578 | 99.2% |
| Value | Count | Frequency (%) |
| 8.23423 | 1 | < 0.1% |
| 8.98159 | 1 | < 0.1% |
| 9.06624 | 1 | < 0.1% |
| 9.2583 | 1 | < 0.1% |
| 9.44646 | 1 | < 0.1% |
| 9.6953 | 1 | < 0.1% |
| 9.72277 | 1 | < 0.1% |
| 9.89254 | 1 | < 0.1% |
| 9.90831 | 1 | < 0.1% |
| 10.10174 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 577.09331 | 1 | < 0.1% |
| 572.18767 | 1 | < 0.1% |
| 561.6315 | 1 | < 0.1% |
| 558.81059 | 1 | < 0.1% |
| 556.48551 | 1 | < 0.1% |
| 554.55197 | 1 | < 0.1% |
| 551.34156 | 1 | < 0.1% |
| 550.00044 | 1 | < 0.1% |
| 546.57729 | 1 | < 0.1% |
| 543.21233 | 1 | < 0.1% |
Expnumberfirst60AAs
Real number (ℝ)
Zeros 
| Distinct | 37247 |
|---|---|
| Distinct (%) | 74.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20.538714 |
| Minimum | 0 |
|---|---|
| Maximum | 49.32205 |
| Zeros | 2544 |
| Zeros (%) | 5.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 16.485703 |
| median | 21.202375 |
| Q3 | 25.194865 |
| 95-th percentile | 41.458534 |
| Maximum | 49.32205 |
| Range | 49.32205 |
| Interquartile range (IQR) | 8.7091625 |
Descriptive statistics
| Standard deviation | 12.199837 |
|---|---|
| Coefficient of variation (CV) | 0.59399225 |
| Kurtosis | -0.45601502 |
| Mean | 20.538714 |
| Median Absolute Deviation (MAD) | 4.46552 |
| Skewness | -0.10608653 |
| Sum | 1026935.7 |
| Variance | 148.83602 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2544 | 5.1% |
| 0.00018 | 92 | 0.2% |
| 42.15085 | 73 | 0.1% |
| 24.87583 | 69 | 0.1% |
| 18.23661 | 66 | 0.1% |
| 1 × 10⁻⁵ | 53 | 0.1% |
| 0.0002 | 52 | 0.1% |
| 36.04048 | 51 | 0.1% |
| 0.00019 | 48 | 0.1% |
| 0.00015 | 46 | 0.1% |
| Other values (37237) | 46906 | 93.8% |
| Value | Count | Frequency (%) |
| 0 | 2544 | 5.1% |
| 1 × 10⁻⁵ | 53 | 0.1% |
| 2 × 10⁻⁵ | 29 | 0.1% |
| 3 × 10⁻⁵ | 23 | < 0.1% |
| 4 × 10⁻⁵ | 16 | < 0.1% |
| 5 × 10⁻⁵ | 10 | < 0.1% |
| 6 × 10⁻⁵ | 24 | < 0.1% |
| 7 × 10⁻⁵ | 12 | < 0.1% |
| 8 × 10⁻⁵ | 26 | 0.1% |
| 9 × 10⁻⁵ | 15 | < 0.1% |
| Value | Count | Frequency (%) |
| 49.32205 | 1 | < 0.1% |
| 49.17952 | 1 | < 0.1% |
| 47.95518 | 1 | < 0.1% |
| 47.95169 | 2 | < 0.1% |
| 47.80002 | 1 | < 0.1% |
| 47.8 | 1 | < 0.1% |
| 47.7514 | 1 | < 0.1% |
| 47.69621 | 1 | < 0.1% |
| 47.67108 | 1 | < 0.1% |
| 47.47652 | 1 | < 0.1% |
TotalprobofNin
Real number (ℝ)
| Distinct | 30669 |
|---|---|
| Distinct (%) | 61.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.58906106 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 4 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.021739 |
| Q1 | 0.23569 |
| median | 0.691315 |
| Q3 | 0.925905 |
| 95-th percentile | 0.9961705 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.690215 |
Descriptive statistics
| Standard deviation | 0.35347854 |
|---|---|
| Coefficient of variation (CV) | 0.60007114 |
| Kurtosis | -1.4075532 |
| Mean | 0.58906106 |
| Median Absolute Deviation (MAD) | 0.28049 |
| Skewness | -0.37670343 |
| Sum | 29453.053 |
| Variance | 0.12494708 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.99854 | 79 | 0.2% |
| 0.56701 | 69 | 0.1% |
| 0.59265 | 66 | 0.1% |
| 0.03881 | 53 | 0.1% |
| 0.86194 | 51 | 0.1% |
| 0.28216 | 42 | 0.1% |
| 0.99602 | 40 | 0.1% |
| 0.95017 | 35 | 0.1% |
| 0.97286 | 33 | 0.1% |
| 0.99959 | 32 | 0.1% |
| Other values (30659) | 49500 | 99.0% |
| Value | Count | Frequency (%) |
| 0 | 4 | < 0.1% |
| 1 × 10⁻⁵ | 3 | < 0.1% |
| 2 × 10⁻⁵ | 5 | < 0.1% |
| 3 × 10⁻⁵ | 2 | < 0.1% |
| 4 × 10⁻⁵ | 10 | < 0.1% |
| 5 × 10⁻⁵ | 6 | < 0.1% |
| 6 × 10⁻⁵ | 11 | < 0.1% |
| 7 × 10⁻⁵ | 2 | < 0.1% |
| 8 × 10⁻⁵ | 3 | < 0.1% |
| 9 × 10⁻⁵ | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 12 | < 0.1% |
| 0.99999 | 19 | < 0.1% |
| 0.99998 | 27 | < 0.1% |
| 0.99997 | 11 | < 0.1% |
| 0.99996 | 16 | < 0.1% |
| 0.99995 | 22 | < 0.1% |
| 0.99994 | 13 | < 0.1% |
| 0.99993 | 15 | < 0.1% |
| 0.99992 | 11 | < 0.1% |
| 0.99991 | 13 | < 0.1% |
POSSIBLENterm
Boolean
Constant  Missing 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 9625 |
| Missing (%) | 19.2% |
| Memory size | 2.1 MiB |
| Value | Count | Frequency (%) |
| True | 40375 | 80.8% |
| (Missing) | 9625 | 19.2% |
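POSSIBLENterm is either True or missing, and it presumably tracks Expnumberfirst60AAs. TMHMM 2.0 warns of a possible N-terminal signal sequence when many helix residues are expected within the first 60 amino acids. In the Sample row, for instance, a value of 39.89314 comes with the flag set to True. The cut-off of 10 used below is an assumption to verify against the TMHMM documentation; the report does not state it. The sketch lists rows where the flag and the threshold disagree. The function name is hypothetical.

```python
import pandas as pd


def nterm_flag_mismatches(df: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    # Assumption: the flag is raised when the expected number of residues in
    # TM helices within the first 60 residues exceeds `threshold`.
    flagged = df["POSSIBLENterm"].eq(True)  # missing counts as "not flagged"
    expected = df["Expnumberfirst60AAs"] > threshold
    return df.loc[flagged != expected]
```

An empty result on the full table would support the assumed threshold. A non-empty result points either to a different cut-off or to a different meaning for the missing values.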
Insidesource
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.5 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TMHMM2.0 |
|---|---|
| 2nd row | TMHMM2.0 |
| 3rd row | TMHMM2.0 |
| 4th row | TMHMM2.0 |
| 5th row | TMHMM2.0 |
Common Values
| Value | Count | Frequency (%) |
| TMHMM2.0 | 50000 | 100.0% |
| Value | Count | Frequency (%) |
| tmhmm2.0 | 50000 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| M | 150000 | 37.5% |
| T | 50000 | 12.5% |
| H | 50000 | 12.5% |
| 2 | 50000 | 12.5% |
| . | 50000 | 12.5% |
| 0 | 50000 | 12.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 400000 | 100.0% |
Insidestart
Real number (ℝ)
High correlation 
| Distinct | 1160 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 93.70036 |
| Minimum | 1 |
|---|---|
| Maximum | 7674 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 33 |
| Q3 | 86 |
| 95-th percentile | 468 |
| Maximum | 7674 |
| Range | 7673 |
| Interquartile range (IQR) | 85 |
Descriptive statistics
| Standard deviation | 196.1528 |
|---|---|
| Coefficient of variation (CV) | 2.093405 |
| Kurtosis | 126.73208 |
| Mean | 93.70036 |
| Median Absolute Deviation (MAD) | 32 |
| Skewness | 7.025596 |
| Sum | 4685018 |
| Variance | 38475.921 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 16488 | 33.0% |
| 27 | 1770 | 3.5% |
| 28 | 1641 | 3.3% |
| 33 | 1366 | 2.7% |
| 38 | 1120 | 2.2% |
| 24 | 1071 | 2.1% |
| 22 | 831 | 1.7% |
| 25 | 659 | 1.3% |
| 43 | 604 | 1.2% |
| 23 | 487 | 1.0% |
| Other values (1150) | 23963 | 47.9% |
| Value | Count | Frequency (%) |
| 1 | 16488 | 33.0% |
| 19 | 15 | < 0.1% |
| 20 | 27 | 0.1% |
| 21 | 8 | < 0.1% |
| 22 | 831 | 1.7% |
| 23 | 487 | 1.0% |
| 24 | 1071 | 2.1% |
| 25 | 659 | 1.3% |
| 26 | 177 | 0.4% |
| 27 | 1770 | 3.5% |
| Value | Count | Frequency (%) |
| 7674 | 1 | < 0.1% |
| 7304 | 1 | < 0.1% |
| 5139 | 1 | < 0.1% |
| 4667 | 1 | < 0.1% |
| 3783 | 1 | < 0.1% |
| 3223 | 2 | < 0.1% |
| 3210 | 1 | < 0.1% |
| 3129 | 1 | < 0.1% |
| 3000 | 1 | < 0.1% |
| 2852 | 1 | < 0.1% |
Insideend
Real number (ℝ)
High correlation 
| Distinct | 1254 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 140.22592 |
| Minimum | 1 |
|---|---|
| Maximum | 7694 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 36 |
| median | 86 |
| Q3 | 153 |
| 95-th percentile | 522 |
| Maximum | 7694 |
| Range | 7693 |
| Interquartile range (IQR) | 117 |
Descriptive statistics
| Standard deviation | 209.16529 |
|---|---|
| Coefficient of variation (CV) | 1.4916307 |
| Kurtosis | 100.12904 |
| Mean | 140.22592 |
| Median Absolute Deviation (MAD) | 56 |
| Skewness | 6.1729031 |
| Sum | 7011296 |
| Variance | 43750.117 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 6 | 3768 | 7.5% |
| 12 | 1354 | 2.7% |
| 4 | 1323 | 2.6% |
| 11 | 928 | 1.9% |
| 20 | 858 | 1.7% |
| 19 | 517 | 1.0% |
| 8 | 460 | 0.9% |
| 1 | 370 | 0.7% |
| 67 | 353 | 0.7% |
| 71 | 336 | 0.7% |
| Other values (1244) | 39733 | 79.5% |
| Value | Count | Frequency (%) |
| 1 | 370 | 0.7% |
| 2 | 52 | 0.1% |
| 4 | 1323 | 2.6% |
| 6 | 3768 | 7.5% |
| 8 | 460 | 0.9% |
| 10 | 35 | 0.1% |
| 11 | 928 | 1.9% |
| 12 | 1354 | 2.7% |
| 15 | 65 | 0.1% |
| 16 | 176 | 0.4% |
| Value | Count | Frequency (%) |
| 7694 | 1 | < 0.1% |
| 7324 | 1 | < 0.1% |
| 5144 | 1 | < 0.1% |
| 4826 | 1 | < 0.1% |
| 3789 | 1 | < 0.1% |
| 3582 | 2 | < 0.1% |
| 3488 | 1 | < 0.1% |
| 3221 | 1 | < 0.1% |
| 3101 | 1 | < 0.1% |
| 2861 | 1 | < 0.1% |
TMhelixsource
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.5 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TMHMM2.0 |
|---|---|
| 2nd row | TMHMM2.0 |
| 3rd row | TMHMM2.0 |
| 4th row | TMHMM2.0 |
| 5th row | TMHMM2.0 |
Common Values
| Value | Count | Frequency (%) |
| TMHMM2.0 | 50000 | 100.0% |
| Value | Count | Frequency (%) |
| tmhmm2.0 | 50000 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| M | 150000 | 37.5% |
| T | 50000 | 12.5% |
| H | 50000 | 12.5% |
| 2 | 50000 | 12.5% |
| . | 50000 | 12.5% |
| 0 | 50000 | 12.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 400000 | 100.0% |
TMhelixstart
Real number (ℝ)
High correlation 
| Distinct | 1178 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 97.82874 |
| Minimum | 2 |
|---|---|
| Maximum | 7651 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 10 |
| median | 37 |
| Q3 | 92 |
| 95-th percentile | 473 |
| Maximum | 7651 |
| Range | 7649 |
| Interquartile range (IQR) | 82 |
Descriptive statistics
| Standard deviation | 197.45472 |
|---|---|
| Coefficient of variation (CV) | 2.0183712 |
| Kurtosis | 123.04498 |
| Mean | 97.82874 |
| Median Absolute Deviation (MAD) | 30 |
| Skewness | 6.932696 |
| Sum | 4891437 |
| Variance | 38988.365 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7 | 3780 | 7.6% |
| 5 | 3459 | 6.9% |
| 4 | 3296 | 6.6% |
| 10 | 1763 | 3.5% |
| 13 | 1454 | 2.9% |
| 15 | 1210 | 2.4% |
| 20 | 1025 | 2.1% |
| 12 | 932 | 1.9% |
| 21 | 859 | 1.7% |
| 39 | 605 | 1.2% |
| Other values (1168) | 31617 | 63.2% |
| Value | Count | Frequency (%) |
| 2 | 370 | 0.7% |
| 3 | 52 | 0.1% |
| 4 | 3296 | 6.6% |
| 5 | 3459 | 6.9% |
| 6 | 269 | 0.5% |
| 7 | 3780 | 7.6% |
| 9 | 460 | 0.9% |
| 10 | 1763 | 3.5% |
| 11 | 87 | 0.2% |
| 12 | 932 | 1.9% |
| Value | Count | Frequency (%) |
| 7651 | 1 | < 0.1% |
| 7281 | 1 | < 0.1% |
| 5145 | 1 | < 0.1% |
| 4827 | 1 | < 0.1% |
| 3763 | 1 | < 0.1% |
| 3222 | 1 | < 0.1% |
| 3200 | 2 | < 0.1% |
| 3106 | 1 | < 0.1% |
| 2977 | 1 | < 0.1% |
| 2829 | 1 | < 0.1% |
TMhelixend
Real number (ℝ)
High correlation 
| Distinct | 1185 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 118.6662 |
| Minimum | 18 |
|---|---|
| Maximum | 7673 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 24 |
| Q1 | 31 |
| median | 58 |
| Q3 | 113 |
| 95-th percentile | 495 |
| Maximum | 7673 |
| Range | 7655 |
| Interquartile range (IQR) | 82 |
Descriptive statistics
| Standard deviation | 197.68297 |
|---|---|
| Coefficient of variation (CV) | 1.6658743 |
| Kurtosis | 122.5839 |
| Mean | 118.6662 |
| Median Absolute Deviation (MAD) | 31 |
| Skewness | 6.9184816 |
| Sum | 5933310 |
| Variance | 39078.557 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 29 | 2714 | 5.4% |
| 26 | 2369 | 4.7% |
| 27 | 2332 | 4.7% |
| 24 | 1606 | 3.2% |
| 32 | 1379 | 2.8% |
| 35 | 1073 | 2.1% |
| 23 | 932 | 1.9% |
| 37 | 900 | 1.8% |
| 34 | 842 | 1.7% |
| 42 | 831 | 1.7% |
| Other values (1175) | 35022 | 70.0% |
| Value | Count | Frequency (%) |
| 18 | 6 | < 0.1% |
| 19 | 62 | 0.1% |
| 20 | 31 | 0.1% |
| 21 | 698 | 1.4% |
| 22 | 617 | 1.2% |
| 23 | 932 | 1.9% |
| 24 | 1606 | 3.2% |
| 25 | 350 | 0.7% |
| 26 | 2369 | 4.7% |
| 27 | 2332 | 4.7% |
| Value | Count | Frequency (%) |
| 7673 | 1 | < 0.1% |
| 7303 | 1 | < 0.1% |
| 5167 | 1 | < 0.1% |
| 4849 | 1 | < 0.1% |
| 3782 | 1 | < 0.1% |
| 3244 | 1 | < 0.1% |
| 3222 | 2 | < 0.1% |
| 3128 | 1 | < 0.1% |
| 2999 | 1 | < 0.1% |
| 2851 | 1 | < 0.1% |
Outsidesource
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.5 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TMHMM2.0 |
|---|---|
| 2nd row | TMHMM2.0 |
| 3rd row | TMHMM2.0 |
| 4th row | TMHMM2.0 |
| 5th row | TMHMM2.0 |
Common Values
| Value | Count | Frequency (%) |
| TMHMM2.0 | 50000 | 100.0% |
| Value | Count | Frequency (%) |
| tmhmm2.0 | 50000 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| M | 150000 | 37.5% |
| T | 50000 | 12.5% |
| H | 50000 | 12.5% |
| 2 | 50000 | 12.5% |
| . | 50000 | 12.5% |
| 0 | 50000 | 12.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 400000 | 100.0% |
Outsidestart
Real number (ℝ)
High correlation 
| Distinct | 1102 |
|---|---|
| Distinct (%) | 2.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 90.05776 |
| Minimum | 1 |
|---|---|
| Maximum | 5168 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 36 |
| Q3 | 86 |
| 95-th percentile | 429 |
| Maximum | 5168 |
| Range | 5167 |
| Interquartile range (IQR) | 85 |
Descriptive statistics
| Standard deviation | 175.93883 |
|---|---|
| Coefficient of variation (CV) | 1.9536221 |
| Kurtosis | 53.157012 |
| Mean | 90.05776 |
| Median Absolute Deviation (MAD) | 35 |
| Skewness | 5.2738131 |
| Sum | 4502888 |
| Variance | 30954.472 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 12951 | 25.9% |
| 30 | 3357 | 6.7% |
| 25 | 1747 | 3.5% |
| 36 | 1643 | 3.3% |
| 27 | 1372 | 2.7% |
| 28 | 1315 | 2.6% |
| 44 | 1077 | 2.2% |
| 35 | 1012 | 2.0% |
| 32 | 854 | 1.7% |
| 43 | 648 | 1.3% |
| Other values (1092) | 24024 | 48.0% |
| Value | Count | Frequency (%) |
| 1 | 12951 | 25.9% |
| 17 | 7 | < 0.1% |
| 18 | 4 | < 0.1% |
| 19 | 6 | < 0.1% |
| 20 | 202 | 0.4% |
| 21 | 69 | 0.1% |
| 22 | 179 | 0.4% |
| 23 | 500 | 1.0% |
| 24 | 114 | 0.2% |
| 25 | 1747 | 3.5% |
| Value | Count | Frequency (%) |
| 5168 | 1 | < 0.1% |
| 4850 | 1 | < 0.1% |
| 3245 | 1 | < 0.1% |
| 2821 | 1 | < 0.1% |
| 2720 | 1 | < 0.1% |
| 2602 | 1 | < 0.1% |
| 2585 | 1 | < 0.1% |
| 2396 | 1 | < 0.1% |
| 2358 | 1 | < 0.1% |
| 2087 | 1 | < 0.1% |
Outsideend
Real number (ℝ)
High correlation 
| Distinct | 1634 |
|---|---|
| Distinct (%) | 3.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 176.87936 |
| Minimum | 3 |
|---|---|
| Maximum | 7650 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 33 |
| median | 81 |
| Q3 | 184 |
| 95-th percentile | 734 |
| Maximum | 7650 |
| Range | 7647 |
| Interquartile range (IQR) | 151 |
Descriptive statistics
| Standard deviation | 297.3179 |
|---|---|
| Coefficient of variation (CV) | 1.6809078 |
| Kurtosis | 45.684828 |
| Mean | 176.87936 |
| Median Absolute Deviation (MAD) | 62 |
| Skewness | 4.8835224 |
| Sum | 8843968 |
| Variance | 88397.932 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3 | 3296 | 6.6% |
| 4 | 2136 | 4.3% |
| 9 | 1763 | 3.5% |
| 14 | 1210 | 2.4% |
| 38 | 531 | 1.1% |
| 19 | 508 | 1.0% |
| 33 | 446 | 0.9% |
| 32 | 383 | 0.8% |
| 30 | 369 | 0.7% |
| 39 | 367 | 0.7% |
| Other values (1624) | 38991 | 78.0% |
| Value | Count | Frequency (%) |
| 3 | 3296 | 6.6% |
| 4 | 2136 | 4.3% |
| 5 | 269 | 0.5% |
| 6 | 12 | < 0.1% |
| 9 | 1763 | 3.5% |
| 10 | 52 | 0.1% |
| 11 | 4 | < 0.1% |
| 12 | 100 | 0.2% |
| 14 | 1210 | 2.4% |
| 16 | 12 | < 0.1% |
| Value | Count | Frequency (%) |
| 7650 | 1 | < 0.1% |
| 7280 | 1 | < 0.1% |
| 6121 | 1 | < 0.1% |
| 5721 | 1 | < 0.1% |
| 5089 | 1 | < 0.1% |
| 5055 | 1 | < 0.1% |
| 4711 | 1 | < 0.1% |
| 4439 | 1 | < 0.1% |
| 4421 | 1 | < 0.1% |
| 4289 | 1 | < 0.1% |
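Each row carries one start/end pair per segment type (Inside, TMhelix, Outside), even though PredictedTMHsNumber reaches 26. The report alone therefore does not say which segment each pair describes. Whatever the convention, every pair should satisfy start ≤ end ≤ Length, provided the coordinates are 1-based inclusive residue positions. That is an assumption, though it fits the Sample row, where Insideend equals Length at 107. The same row places a helix at residues 33 to 52, a span of 20. The 0.994 correlation between TMhelixstart and TMhelixend is consistent with helix spans that vary little.

The sketch below flags violations and derives the helix span with pandas. The input column names come from the report; the function and output column names are hypothetical.

```python
import pandas as pd

SEGMENT_TYPES = ("Inside", "TMhelix", "Outside")


def segment_checks(df: pd.DataFrame) -> pd.DataFrame:
    result = pd.DataFrame(index=df.index)
    for seg in SEGMENT_TYPES:
        start, end = df[f"{seg}start"], df[f"{seg}end"]
        # True marks a violation of start <= end <= Length.
        result[f"{seg}_start_after_end"] = start > end
        result[f"{seg}_end_beyond_length"] = end > df["Length"]
    # Helix span in residues, counting both endpoints (inclusive coordinates assumed).
    result["TMhelix_span"] = df["TMhelixend"] - df["TMhelixstart"] + 1
    return result
```

`segment_checks(df).drop(columns="TMhelix_span").any()` summarizes which checks fail anywhere in the table.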
Phage_source
Categorical
| Distinct | 13 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.3 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 3 |
| Mean length | 3.82022 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | MGV |
|---|---|
| 2nd row | MGV |
| 3rd row | TemPhD |
| 4th row | TemPhD |
| 5th row | GPD |
Common Values
| Value | Count | Frequency (%) |
| MGV | 14613 | 29.2% |
| GPD | 13143 | 26.3% |
| TemPhD | 7794 | 15.6% |
| GOV2 | 6868 | 13.7% |
| CHVD | 3605 | 7.2% |
| GVD | 1400 | 2.8% |
| RefSeq | 767 | 1.5% |
| IGVD | 584 | 1.2% |
| PhagesDB | 551 | 1.1% |
| Genbank | 366 | 0.7% |
| Other values (3) | 309 | 0.6% |
| Value | Count | Frequency (%) |
| mgv | 14613 | 29.2% |
| gpd | 13143 | 26.3% |
| temphd | 7794 | 15.6% |
| gov2 | 6868 | 13.7% |
| chvd | 3605 | 7.2% |
| gvd | 1400 | 2.8% |
| refseq | 767 | 1.5% |
| igvd | 584 | 1.2% |
| phagesdb | 551 | 1.1% |
| genbank | 366 | 0.7% |
| Other values (3) | 309 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| G | 36974 | 19.4% |
| V | 27327 | 14.3% |
| D | 27131 | 14.2% |
| P | 21488 | 11.2% |
| M | 14638 | 7.7% |
| e | 10245 | 5.4% |
| h | 8345 | 4.4% |
| T | 8051 | 4.2% |
| m | 7794 | 4.1% |
| O | 6868 | 3.6% |
| Other values (18) | 22150 | 11.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 191011 | 100.0% |
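The frequency tables in this report list the most common values and fold the remainder into an "Other values (k)" row, with percentages taken over the total count. For Phage_source, the ten listed sources cover 49691 of the 50000 rows, which leaves 309 rows (0.6%) for the three remaining sources. The sketch below reproduces that layout with pandas. The display rule for small shares ("< 0.1%") is an assumption about how the report rounds, and the function name is hypothetical.

```python
import pandas as pd


def frequency_table(s: pd.Series, top: int = 10) -> pd.DataFrame:
    counts = s.value_counts(dropna=False)  # most frequent first
    total = len(s)
    rows = [
        ("(Missing)" if pd.isna(value) else str(value), int(count))
        for value, count in counts.iloc[:top].items()
    ]
    rest = counts.iloc[top:]
    if len(rest) > 0:
        rows.append((f"Other values ({len(rest)})", int(rest.sum())))

    def fmt(count: int) -> str:
        pct = 100 * count / total
        # Shares that would round to 0.0% are shown as "< 0.1%" (assumed display rule).
        return "< 0.1%" if 0 < pct < 0.05 else f"{pct:.1f}%"

    return pd.DataFrame(
        [(value, count, fmt(count)) for value, count in rows],
        columns=["Value", "Count", "Frequency (%)"],
    )
```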
Correlations
| | Expnumberfirst60AAs | ExpnumberofAAsinTMHs | Insideend | Insidestart | Length | Outsideend | Outsidestart | Phage_source | PredictedTMHsNumber | TMhelixend | TMhelixstart | TotalprobofNin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Expnumberfirst60AAs | 1.000 | 0.406 | -0.168 | 0.129 | -0.354 | -0.330 | -0.121 | 0.049 | 0.355 | -0.145 | -0.164 | 0.155 |
| ExpnumberofAAsinTMHs | 0.406 | 1.000 | 0.513 | 0.731 | 0.269 | 0.323 | 0.584 | 0.050 | 0.873 | 0.693 | 0.673 | 0.083 |
| Insideend | -0.168 | 0.513 | 1.000 | 0.782 | 0.525 | 0.122 | 0.289 | 0.024 | 0.530 | 0.666 | 0.659 | -0.259 |
| Insidestart | 0.129 | 0.731 | 0.782 | 1.000 | 0.291 | 0.119 | 0.251 | 0.023 | 0.781 | 0.661 | 0.655 | -0.189 |
| Length | -0.354 | 0.269 | 0.525 | 0.291 | 1.000 | 0.737 | 0.441 | 0.022 | 0.263 | 0.486 | 0.491 | -0.013 |
| Outsideend | -0.330 | 0.323 | 0.122 | 0.119 | 0.737 | 1.000 | 0.730 | 0.018 | 0.308 | 0.611 | 0.623 | 0.237 |
| Outsidestart | -0.121 | 0.584 | 0.289 | 0.251 | 0.441 | 0.730 | 1.000 | 0.028 | 0.619 | 0.762 | 0.764 | 0.280 |
| Phage_source | 0.049 | 0.050 | 0.024 | 0.023 | 0.022 | 0.018 | 0.028 | 1.000 | 0.047 | 0.023 | 0.023 | 0.027 |
| PredictedTMHsNumber | 0.355 | 0.873 | 0.530 | 0.781 | 0.263 | 0.308 | 0.619 | 0.047 | 1.000 | 0.676 | 0.678 | 0.097 |
| TMhelixend | -0.145 | 0.693 | 0.666 | 0.661 | 0.486 | 0.611 | 0.762 | 0.023 | 0.676 | 1.000 | 0.994 | 0.090 |
| TMhelixstart | -0.164 | 0.673 | 0.659 | 0.655 | 0.491 | 0.623 | 0.764 | 0.023 | 0.678 | 0.994 | 1.000 | 0.099 |
| TotalprobofNin | 0.155 | 0.083 | -0.259 | -0.189 | -0.013 | 0.237 | 0.280 | 0.027 | 0.097 | 0.090 | 0.099 | 1.000 |
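The "High correlation" alerts in the overview can be read off this matrix. The minimal sketch below flags every pair whose absolute coefficient exceeds 0.5. That threshold is an assumption, although with this matrix it reproduces all nine alerts, including the partner counts ("and N other fields"). The matrix values are copied from the table.

```python
# Correlation matrix copied from the table above (same row/column order).
# THRESHOLD = 0.5 is an assumption that happens to reproduce the alerts.
names = [
    "Expnumberfirst60AAs", "ExpnumberofAAsinTMHs", "Insideend", "Insidestart",
    "Length", "Outsideend", "Outsidestart", "Phage_source",
    "PredictedTMHsNumber", "TMhelixend", "TMhelixstart", "TotalprobofNin",
]
matrix = [
    [1.000, 0.406, -0.168, 0.129, -0.354, -0.330, -0.121, 0.049, 0.355, -0.145, -0.164, 0.155],
    [0.406, 1.000, 0.513, 0.731, 0.269, 0.323, 0.584, 0.050, 0.873, 0.693, 0.673, 0.083],
    [-0.168, 0.513, 1.000, 0.782, 0.525, 0.122, 0.289, 0.024, 0.530, 0.666, 0.659, -0.259],
    [0.129, 0.731, 0.782, 1.000, 0.291, 0.119, 0.251, 0.023, 0.781, 0.661, 0.655, -0.189],
    [-0.354, 0.269, 0.525, 0.291, 1.000, 0.737, 0.441, 0.022, 0.263, 0.486, 0.491, -0.013],
    [-0.330, 0.323, 0.122, 0.119, 0.737, 1.000, 0.730, 0.018, 0.308, 0.611, 0.623, 0.237],
    [-0.121, 0.584, 0.289, 0.251, 0.441, 0.730, 1.000, 0.028, 0.619, 0.762, 0.764, 0.280],
    [0.049, 0.050, 0.024, 0.023, 0.022, 0.018, 0.028, 1.000, 0.047, 0.023, 0.023, 0.027],
    [0.355, 0.873, 0.530, 0.781, 0.263, 0.308, 0.619, 0.047, 1.000, 0.676, 0.678, 0.097],
    [-0.145, 0.693, 0.666, 0.661, 0.486, 0.611, 0.762, 0.023, 0.676, 1.000, 0.994, 0.090],
    [-0.164, 0.673, 0.659, 0.655, 0.491, 0.623, 0.764, 0.023, 0.678, 0.994, 1.000, 0.099],
    [0.155, 0.083, -0.259, -0.189, -0.013, 0.237, 0.280, 0.027, 0.097, 0.090, 0.099, 1.000],
]
THRESHOLD = 0.5

def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables whose |r| exceeds threshold."""
    flagged = {}
    for i, row_name in enumerate(names):
        partners = [names[j] for j, r in enumerate(matrix[i])
                    if j != i and abs(r) > threshold]
        if partners:
            flagged[row_name] = partners
    return flagged

flagged = high_correlations(names, matrix, THRESHOLD)
for var, partners in flagged.items():
    print(f"{var} is highly overall correlated with {partners[0]} "
          f"and {len(partners) - 1} other fields")
```

Expnumberfirst60AAs, Phage_source and TotalprobofNin are not flagged, because none of their coefficients exceeds 0.5 in absolute value.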
Missing values
Sample
| | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1392473 | MGV-GENOME-0377366 | MGV-GENOME-0377366_94 | 107 | 2 | 39.89314 | 39.89314 | 0.99747 | True | TMHMM2.0 | 53.0 | 107.0 | TMHMM2.0 | 33.0 | 52.0 | TMHMM2.0 | 30.0 | 32.0 | MGV |
| 1225655 | MGV-GENOME-0228589 | MGV-GENOME-0228589_3 | 204 | 2 | 45.03472 | 42.87273 | 0.99861 | True | TMHMM2.0 | 62.0 | 204.0 | TMHMM2.0 | 39.0 | 61.0 | TMHMM2.0 | 30.0 | 38.0 | MGV |
| 2065853 | TemPhD_cluster_54944 | TemPhD_cluster_54944_50 | 108 | 1 | 22.15821 | 21.19442 | 0.14212 | True | TMHMM2.0 | 43.0 | 108.0 | TMHMM2.0 | 24.0 | 42.0 | TMHMM2.0 | 1.0 | 23.0 | TemPhD |
| 1828787 | TemPhD_cluster_21940 | TemPhD_cluster_21940_29 | 41 | 1 | 22.60786 | 22.60786 | 0.35682 | True | TMHMM2.0 | 38.0 | 41.0 | TMHMM2.0 | 15.0 | 37.0 | TMHMM2.0 | 1.0 | 14.0 | TemPhD |
| 575773 | uvig_280215 | uvig_280215_16 | 571 | 1 | 22.92261 | 0.00000 | 0.91546 | NaN | TMHMM2.0 | 1.0 | 169.0 | TMHMM2.0 | 170.0 | 192.0 | TMHMM2.0 | 193.0 | 571.0 | GPD |
| 1869556 | TemPhD_cluster_2820 | TemPhD_cluster_2820_6 | 66 | 1 | 19.76984 | 19.76956 | 0.91180 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 26.0 | TMHMM2.0 | 27.0 | 66.0 | TemPhD |
| 717310 | uvig_396803 | uvig_396803_67 | 183 | 1 | 18.65302 | 18.45494 | 0.74857 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 25.0 | TMHMM2.0 | 26.0 | 183.0 | GPD |
| 1193825 | MGV-GENOME-0085121 | MGV-GENOME-0085121_16 | 98 | 2 | 42.31297 | 40.90727 | 0.98844 | True | TMHMM2.0 | 62.0 | 98.0 | TMHMM2.0 | 44.0 | 61.0 | TMHMM2.0 | 35.0 | 43.0 | MGV |
| 1524904 | MGV-GENOME-0378116 | MGV-GENOME-0378116_30 | 122 | 2 | 41.78412 | 23.94402 | 0.91384 | True | TMHMM2.0 | 81.0 | 122.0 | TMHMM2.0 | 58.0 | 80.0 | TMHMM2.0 | 44.0 | 57.0 | MGV |
| 1051224 | MGV-GENOME-0357329 | MGV-GENOME-0357329_12 | 169 | 3 | 64.41947 | 27.06577 | 0.30090 | True | TMHMM2.0 | 119.0 | 169.0 | TMHMM2.0 | 96.0 | 118.0 | TMHMM2.0 | 93.0 | 95.0 | MGV |
| | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 890899 | uvig_582989 | uvig_582989_7 | 299 | 2 | 44.44968 | 30.89670 | 0.99829 | True | TMHMM2.0 | 75.0 | 299.0 | TMHMM2.0 | 55.0 | 74.0 | TMHMM2.0 | 36.0 | 54.0 | GPD |
| 1770451 | TemPhD_cluster_13199 | TemPhD_cluster_13199_51 | 117 | 3 | 65.10179 | 29.44552 | 0.99441 | True | TMHMM2.0 | 75.0 | 93.0 | TMHMM2.0 | 94.0 | 116.0 | TMHMM2.0 | 117.0 | 117.0 | TemPhD |
| 128492 | Ma_2019_SRR413601_NODE_1094_length_12962_cov_2.987216 | Ma_2019_SRR413601_NODE_1094_length_12962_cov_2.987216_6 | 152 | 1 | 17.47566 | 17.46996 | 0.16250 | True | TMHMM2.0 | 29.0 | 152.0 | TMHMM2.0 | 10.0 | 28.0 | TMHMM2.0 | 1.0 | 9.0 | GVD |
| 434484 | uvig_176237 | uvig_176237_10 | 224 | 2 | 34.75602 | 22.97046 | 0.50513 | True | TMHMM2.0 | 33.0 | 52.0 | TMHMM2.0 | 53.0 | 75.0 | TMHMM2.0 | 76.0 | 224.0 | GPD |
| 809749 | uvig_492895 | uvig_492895_10 | 91 | 2 | 41.70662 | 41.19610 | 0.99907 | True | TMHMM2.0 | 59.0 | 91.0 | TMHMM2.0 | 41.0 | 58.0 | TMHMM2.0 | 27.0 | 40.0 | GPD |
| 2251830 | SAMN05414905_a1_ct51712_vs1 | SAMN05414905_a1_ct51712_vs1_1 | 111 | 1 | 29.94583 | 9.53467 | 0.81494 | NaN | TMHMM2.0 | 1.0 | 56.0 | TMHMM2.0 | 57.0 | 79.0 | TMHMM2.0 | 80.0 | 111.0 | CHVD |
| 24239 | NC_042134.1 | YP_009625909.1 | 67 | 2 | 42.62807 | 42.62770 | 0.98902 | True | TMHMM2.0 | 56.0 | 67.0 | TMHMM2.0 | 33.0 | 55.0 | TMHMM2.0 | 30.0 | 32.0 | RefSeq |
| 2734654 | Station102_MES_ALL_assembly_NODE_272_length_47729_cov_28.559886 | Station102_MES_ALL_assembly_NODE_272_length_47729_cov_28.559886_53 | 121 | 1 | 19.79574 | 0.00006 | 0.92093 | NaN | TMHMM2.0 | 1.0 | 87.0 | TMHMM2.0 | 88.0 | 110.0 | TMHMM2.0 | 111.0 | 121.0 | GOV2 |
| 1551990 | MGV-GENOME-0364654 | MGV-GENOME-0364654_31 | 111 | 1 | 17.60196 | 17.59891 | 0.02257 | True | TMHMM2.0 | 33.0 | 111.0 | TMHMM2.0 | 15.0 | 32.0 | TMHMM2.0 | 1.0 | 14.0 | MGV |
| 2152546 | TemPhD_cluster_6461 | TemPhD_cluster_6461_44 | 225 | 1 | 22.98798 | 0.00000 | 0.51573 | NaN | TMHMM2.0 | 184.0 | 225.0 | TMHMM2.0 | 161.0 | 183.0 | TMHMM2.0 | 1.0 | 160.0 | TemPhD |
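POSSIBLENterm is either True or missing. In the sample rows above, it is True exactly when Expnumberfirst60AAs is above 10. This suggests that the column records TMHMM 2.0's "POSSIBLE N-term signal sequence" warning, with NaN meaning the warning was not printed. The cutoff of 10 is an assumption about TMHMM's output convention. The minimal sketch below only checks it against the 20 sample rows.

```python
# (Expnumberfirst60AAs, POSSIBLENterm) pairs copied from the 20 sample rows
# above; None stands for the report's NaN (warning absent).
sample = [
    (39.89314, True), (42.87273, True), (21.19442, True), (22.60786, True),
    (0.00000, None), (19.76956, True), (18.45494, True), (40.90727, True),
    (23.94402, True), (27.06577, True), (30.89670, True), (29.44552, True),
    (17.46996, True), (22.97046, True), (41.19610, True), (9.53467, None),
    (42.62770, True), (0.00006, None), (17.59891, True), (0.00000, None),
]

# Assumed cutoff: the warning is emitted when the expected number of
# residues in TM helices within the first 60 AAs exceeds 10.
CUTOFF = 10.0

def predicted_flag(first60, cutoff=CUTOFF):
    """Return True if the warning is expected, else None (mirrors NaN)."""
    return True if first60 > cutoff else None

mismatches = [(x, flag) for x, flag in sample if predicted_flag(x) != flag]
print(f"{len(sample) - len(mismatches)}/{len(sample)} rows agree with the cutoff")
```

Agreement on 20 rows does not prove the rule for all 50,000 observations. Applying the same check to the full table would confirm whether the 9,625 missing POSSIBLENterm values line up with Expnumberfirst60AAs at or below 10.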